{
    "cells": [
        {
            "cell_type": "markdown",
            "source": [
                "# 10 minutes to Optimus\r\n",
                "\r\n",
                "**👋 Hi, Are you in Google Colab?**\r\n",
                "\r\n",
                "In Google Colab you can easily run Optimus. If you're not, you may want visit the link below\r\n",
                "\r\n",
                "https://colab.research.google.com/github/hi-primus/optimus/blob/master/examples/10_min_to_optimus.ipynb"
            ],
            "metadata": {}
        },
        {
            "cell_type": "markdown",
            "source": [
                "## Install Optimus"
            ],
            "metadata": {}
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "source": [
                "!pip install git+https://github.com/hi-primus/optimus.git@develop-21.8#egg=pyoptimus[pandas]"
            ],
            "outputs": [],
            "metadata": {}
        },
        {
            "cell_type": "markdown",
            "source": [
                "## Restart Runtime\r\n",
                "Before you continue, please go to the 'Runtime' Menu above, and select 'Restart Runtime (Ctrl + M + .)'."
            ],
            "metadata": {}
        },
        {
            "cell_type": "markdown",
            "source": [
                "**🎉 You are done. Enjoy Optimus!**\r\n",
                "## Import Optimus and start it"
            ],
            "metadata": {}
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "source": [
                "from optimus import Optimus\r\n",
                "op = Optimus(\"pandas\")"
            ],
            "outputs": [],
            "metadata": {}
        },
        {
            "cell_type": "markdown",
            "source": [
                "## Dataframe creation\r\n",
                "\r\n",
                "Create a dataframe to passing a list of values for each column."
            ],
            "metadata": {}
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "source": [
                "df = op.create.dataframe({\r\n",
                "        \"words\": [\"  I like     fish  \", \"    zombies\", \"simpsons   cat lady\", None],\r\n",
                "        \"num\": [1, 2, 2, 3],\r\n",
                "        \"animals\": [\"dog\", \"cat\", \"frog\", \"eagle\"],\r\n",
                "        \"thing\": [\"housé\", \"tv\", \"table\", \"glass\"],\r\n",
                "        \"two strings\": [\"cat-car\", \"dog-tv\", \"eagle-tv-plus\", \"lion-pc\"],\r\n",
                "        \"filter\": [\"a\", \"b\", \"1\", \"c\"],\r\n",
                "        \"num 2\": [\"1\", \"2\", \"3\", \"4\"],\r\n",
                "        \"col_array\": [[\"baby\", \"sorry\"], [\"baby 1\", \"sorry 1\"], [\"baby 2\", \"sorry 2\"], [\"baby 3\", \"sorry 3\"]],  \r\n",
                "        \"col_int\": [[1, 2, 3], [3, 4], [5, 6, 7], [7, 8]]\r\n",
                "})\r\n",
                "\r\n",
                "df.display()"
            ],
            "outputs": [],
            "metadata": {}
        },
        {
            "cell_type": "markdown",
            "source": [
                "Creating a dataframe by passing a list of tuples specifyng the column data type."
            ],
            "metadata": {}
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "source": [
                "\r\n",
                "df = op.create.dataframe({\r\n",
                "        (\"words\", \"str\"): [\"  I like     fish  \", \"    zombies\", \"simpsons   cat lady\", None],\r\n",
                "        (\"num\", \"int\"): [1, 2, 2, 3],\r\n",
                "        (\"animals\", \"str\"): [\"dog\", \"cat\", \"frog\", \"eagle\"],\r\n",
                "        (\"thing\", \"str\"): [\"housé\", \"tv\", \"table\", \"glass\"],\r\n",
                "        (\"two strings\", \"str\"): [\"cat-car\", \"dog-tv\", \"eagle-tv-plus\", \"lion-pc\"],\r\n",
                "        (\"filter\", \"str\"): [\"a\", \"b\", \"1\", \"c\"],\r\n",
                "        (\"num 2\", \"string\"): [\"1\", \"2\", \"3\", \"4\"],\r\n",
                "        \"col_array\": [[\"baby\", \"sorry\"], [\"baby 1\", \"sorry 1\"], [\"baby 2\", \"sorry 2\"], [\"baby 3\", \"sorry 3\"]],                \r\n",
                "        \"col_int\": [[1, 2, 3], [3, 4], [5, 6, 7], [7, 8]]\r\n",
                "})\r\n",
                "\r\n",
                "df.display()"
            ],
            "outputs": [],
            "metadata": {}
        },
        {
            "cell_type": "markdown",
            "source": [
                "Creating an Optimus dataframe using a pandas dataframe"
            ],
            "metadata": {}
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "source": [
                "import pandas as pd\r\n",
                "\r\n",
                "data = [(\"bumbl#ebéé  \", 17.5, \"Espionage\", 7),\r\n",
                "        (\"Optim'us\", 28.0, \"Leader\", 10),\r\n",
                "        (\"ironhide&\", 26.0, \"Security\", 7)]\r\n",
                "\r\n",
                "labels = [\"names\", \"height\", \"function\", \"rank\"]\r\n",
                "\r\n",
                "pdf = pd.DataFrame.from_records(data, columns=labels)\r\n",
                "\r\n",
                "df = op.create.dataframe(dfd=pdf)\r\n",
                "\r\n",
                "df.display()"
            ],
            "outputs": [],
            "metadata": {}
        },
        {
            "cell_type": "markdown",
            "source": [
                "# Dataframe loading"
            ],
            "metadata": {}
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "source": [
                "df = op.load.file(\"https://raw.githubusercontent.com/hi-primus/optimus/develop-21.8/examples/data/foo.csv\")\r\n",
                "df.display()"
            ],
            "outputs": [],
            "metadata": {}
        },
        {
            "cell_type": "markdown",
            "source": [
                "## Viewing data\r\n",
                "Here is how to view the first 20 elements in a dataframe"
            ],
            "metadata": {}
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "source": [
                "df.display(20)"
            ],
            "outputs": [],
            "metadata": {}
        },
        {
            "cell_type": "markdown",
            "source": [
                "Display in plain text using print"
            ],
            "metadata": {}
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "source": [
                "df.print(5)"
            ],
            "outputs": [],
            "metadata": {}
        },
        {
            "cell_type": "markdown",
            "source": [
                "# Transforming data\r\n",
                "To transform data you can use operations like `upper` to transform the text data to uppercases or `rename` to rename a column."
            ],
            "metadata": {}
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "source": [
                "df.display()\r\n",
                "df.cols.rename(\"firstName\", \"name\").display(highlight=\"name\")\r\n",
                "df.cols.upper(\"lastName\").display(highlight=\"lastName\")"
            ],
            "outputs": [],
            "metadata": {}
        },
        {
            "cell_type": "markdown",
            "source": [
                "## Chaining\r\n",
                "\r\n",
                "The past transformations were done step by step, but this can be achieved by chaining all operations into one line of code, like the cell below."
            ],
            "metadata": {}
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "source": [
                "df.display()\r\n",
                "df \\\r\n",
                "    .cols.rename(\"billingId\", \"billing\") \\\r\n",
                "    .cols.drop([\"id\", \"dummyCol\"]) \\\r\n",
                "    .cols.append({\"zeros\": 0}) \\\r\n",
                "    .cols.sort(order=\"desc\") \\\r\n",
                "    .cols.upper(\"product\") \\\r\n",
                "    .display()"
            ],
            "outputs": [],
            "metadata": {}
        },
        {
            "cell_type": "markdown",
            "source": [
                "## More examples\r\n",
                "\r\n",
                "Delete repeated rows"
            ],
            "metadata": {}
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "source": [
                "df.rows.drop_duplicated(\"product\").display()"
            ],
            "outputs": [],
            "metadata": {}
        },
        {
            "cell_type": "markdown",
            "source": [
                "Replace repeated values"
            ],
            "metadata": {}
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "source": [
                "df.set.duplicated(\"product\", \"N/A\").display(highlight=\"product\")"
            ],
            "outputs": [],
            "metadata": {}
        },
        {
            "cell_type": "markdown",
            "source": [
                "Profile of the dataframe"
            ],
            "metadata": {}
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "source": [
                "df.profile(\"*\", bins=3) # \"*\" = select all columns"
            ],
            "outputs": [],
            "metadata": {}
        }
    ],
    "metadata": {
        "orig_nbformat": 4,
        "language_info": {
            "name": "python"
        }
    },
    "nbformat": 4,
    "nbformat_minor": 2
}